(54) Propagating updates efficiently in hierarchically structured data under a push model



(57) One embodiment of the present invention pro- 
vides a system that efficiently propagates changes in 
hierarchically organized data to remotely cached copies 
of the data. The system operates by receiving changes 
(402) to the data located on the server (204), and apply- 
ing (404) the changes to the data on the server (204). 
These changes are propagated to remotely cached 
copies of the data on a client (206, 208, 210, 212) in
response to an event on a server, and independently of 
events on the client, by (1) determining differences 
(406) between the current version of the data at the
server and an older copy of the data at the client, which 
the server has stored locally; (2) using the differences to 
construct (408) an update for the copy of the data,
which may include node insertion and node deletion
operations for hierarchically organized nodes in the 
data; and (3) sending (410) the update to the client 
where the update is applied to the copy of the data to 
produce an updated copy of the data. According to one 
aspect of the present invention, the act of determining 
differences, and the act of using the differences to con- 
struct the update both take place during a single pass 
through the data. According to another aspect of the 
present invention, the update for the copy of the data 
may include node copy, node move, node collapse, 
node split, node swap and node update operations.
[0001] The present invention relates to distributed
computing systems and databases. More particularly,
the present invention relates to a method and an apparatus
that facilitates detecting changes in hierarchically
structured data and producing corresponding updates 
for remote copies of the hierarchically structured data. 
[0002] The advent of the Internet has led to the 
development of web browsers that allow a user to navi- 
gate through inter-linked pages of textual data and 
graphical images distributed across geographically dis- 
tributed web servers. Unfortunately, as the Internet 
becomes increasingly popular, the Internet often experi- 
ences so much use that accesses from web browsers to 
web servers often slow to a crawl. 
[0003] In order to alleviate this problem, a copy of a 
portion of a web document from a web server (docu- 
ment server) can be cached on a client computer sys- 
tem, or alternatively, on an intermediate proxy server, so 
that an access to the portion of the document does not 
have to travel all the way back to the document server. 
Instead, the access can be serviced from a cached copy 
of the portion of the document located on the local com- 
puter system or on the proxy server. 
[0004] However, if the data on the document server 
is frequently updated, these updates must propagate to 
the cached copies on proxy servers and client computer 
systems. Such updates are presently propagated by 
simply sending a new copy of the data to the proxy servers
and client computer systems. However, this technique
is often inefficient because most of the data in the
new copy is typically the same as the data in the cached 
copy. In this case, it would be more efficient to simply 
send changes to the data instead of sending a complete 
copy of the data. 

[0005] This is particularly true when the changes to 
the data involve simple manipulations in hierarchically 
structured data. Hierarchically structured data typically 
includes a collection of nodes containing data in a 
number of forms including textual data, database 
records, graphical data, and audio data. These nodes 
are typically inter-linked by pointers (or some other type 
of linkage) into a hierarchical structure, which has 
nodes that are subordinate to other nodes, such as a
tree, although other types of linkages are possible.
[0006] Manipulations of hierarchically structured 
data may take the form of operations on nodes, such as 
node insertions, node deletions or node movements. 
Although such operations can be succinctly stated and 
easily performed, there presently exists no mechanism 
to transmit such operations to update copies of the hier- 
archically structured data. Instead, existing systems first 
apply the operations to the data, and then transmit the 
data across the network to update copies of the data on 
local machines and proxy servers. 



[0007] One embodiment of the present invention
provides a system that efficiently propagates changes
in hierarchically organized data to remotely cached copies
of the data. The system operates by receiving
changes to the data located on the server, and applying
the changes to the data on the server. These changes
are propagated to remotely cached copies of the data in
response to an event on the server and independently
of the client, by (1) determining differences between the
current version of the data at the server and an older
copy of the data at the client, which the server has
stored locally; (2) using the differences to construct an
update for the copy of the data, which may include node
insertion and node deletion operations for hierarchically
organized nodes in the data; and (3) sending the update
to the client where the update is applied to the copy of
the data to produce an updated copy of the data.
According to one aspect of the present invention, the
act of determining differences, and the act of using the
differences to construct the update both take place during
a single pass through the data. According to another
aspect of the present invention, the update for the copy
of the data may include node copy, node move, node
collapse, node split, node swap and node update operations.



BRIEF DESCRIPTION OF THE FIGURES 
[0008] 

FIG. 1 illustrates a computer system including a 
web browser and a web server in accordance with 
an embodiment of the present invention. 
FIG. 2 illustrates a computer system including a 
server that automatically updates local copies of 
documents in accordance with another embodiment
of the present invention.
FIG. 3 is a flow chart illustrating how a client 
requests an update from a server in accordance 
with an embodiment of the present invention. 
FIG. 4 is a flow chart illustrating how a server auto- 
matically updates local copies of documents in 
accordance with an embodiment of the present 
invention. 

FIG. 5 is a flow chart illustrating how the system 
creates updates for a new copy of hierarchically 
structured data in accordance with an embodiment 
of the present invention. 

FIGs. 6A-6I illustrate the steps involved in creating 
updates to transform a document tree T1 into a 
document tree T2. 



DETAILED DESCRIPTION

[0009] The following description is presented to
enable any person skilled in the art to make and use the
invention, and is provided in the context of a particular
application and its requirements. Various modifications
to the disclosed embodiments will be readily apparent to
those skilled in the art, and the general principles
defined herein may be applied to other embodiments 
and applications without departing from the spirit and 
scope of the present invention. Thus, the present inven- 
tion is not intended to be limited to the embodiments 
shown, but is to be accorded the widest scope consist- 
ent with the principles and features disclosed herein. 
[0010] The data structures and code described in 
this detailed description are typically stored on a com- 
puter readable storage medium, which may be any 
device or medium that can store code and/or data for 
use by a computer system. This includes, but is not lim- 
ited to, magnetic and optical storage devices such as 
disk drives, magnetic tape, CDs (compact discs) and
DVDs (digital video discs), and computer instruction sig- 
nals embodied in a carrier wave. For example, the car- 
rier wave may carry information across a 
communications network, such as the Internet. 

Computer System 

[0011] FIG. 1 illustrates a computer system includ- 
ing a web browser and a web server in accordance with 
an embodiment of the present invention. In the illus- 
trated embodiment, network 102 couples together 
server 104 and client 106. Network 102 generally refers 
to any type of wire or wireless link between computers, 
including, but not limited to, a local area network, a wide 
area network, or a combination of networks. In one 
embodiment of the present invention, network 102 
includes the Internet. Server 104 may be any node coupled
to network 102 that includes a mechanism for servicing
requests from a client for computational or data
storage resources. Client 106 may be any node coupled
to network 102 that includes a mechanism for requesting
computational or data storage resources from
server 104.

[0012] Server 104 contains web server 112, which
stores data for at least one web site in the form of
inter-linked pages of textual and graphical information. Web
server 112 additionally includes a mechanism to create
updates for remotely cached copies of data from web
server 112.

[0013] Web server 112 stores textual and graphical
information related to various websites in document
database 116. Document database 116 may exist in a
number of locations and in a number of forms. In one
embodiment of the present invention, database 116
resides within the same computer system as server 104.
In another embodiment, document database 116 resides at
a remote location, and is accessed by server 104
through network 102. Note that portions of document
database 116 may reside in volatile or non-volatile
semiconductor memory. Alternatively, portions of document
database 116 may reside within rotating storage
devices containing magnetic, optical or magneto-optical
storage media.

[0014] Client 106 includes web browser 114, which
allows a user 110 viewing display 108 to navigate
through various websites coupled to network 102. Web
browser 114 stores cached copies 118 of portions of
website documents in local storage on client 106.
[0015] During operation, the system illustrated in
FIG. 1 operates generally as follows. In communicating
with web browser 114, user 110 generates an access to
a document in web server 112. In processing the
access, web browser 114 first examines cached copies
118 to determine if the access is directed to a portion of
a web document that is already cached within client
106. If so, client 106 makes an update request 120,
which is transferred across network 102 to server 104.
In response to the request, server 104 generates an
update 122, which is transferred to web browser 114.
Update 122 is then applied to the cached copies 118 in
order to update cached copies 118. Finally, the access
is allowed to proceed on the cached copies 118.
[0016] Note that although the example illustrated in
FIG. 1 deals with web documents for use with web
browsers and web servers, in general the present invention
can be applied to any type of data. This may include
data stored in a hierarchical database. This may also
include data related to a directory service that supports
a hierarchical name space.

[0017] Also, server 104 and web server 112 may
actually be a proxy server that stores data in transit
between a web server and web browser 114. In this
case, the invention operates on communications
between the proxy server and web browser 114.
[0018] In a variation on the embodiment illustrated
in FIG. 1, client 106 is a "thin client" with limited memory
space for storing cached copies of documents 118. In
this variation, when client 106 requests a document,
only the subset of the document that client 106 is actually
viewing is sent from server 104 to client 106. This subset
is adaptively updated as client 106 navigates through
the document.

[0019] In another variation on the above embodiment,
documents from document database 116 are
tree-structured. In this variation, documents or portions
of documents that are sent from server 104 to client 106
are first validated to ensure that they specify a proper
tree structure before they are sent to client 106. This
eliminates the need for client 106 to validate the data.
(Validation is typically performed by parsing the data,
constructing a tree from the data, and validating that the
tree is properly structured.) Reducing this work on the
client side can be particularly useful for thin clients,
which may lack computing resources for performing
such validation operations.
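
The server-side validation step can be sketched as follows. This is a minimal illustration that assumes the documents are serialized as XML text; the text above only requires a tree structure. The function name is_well_formed_tree is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed_tree(document_text: str) -> bool:
    """Return True if the text parses into a single, properly nested tree.

    Assumption: the document is XML. Parsing builds the tree, and a parse
    error means the tags do not form a proper tree structure.
    """
    try:
        ET.fromstring(document_text)
    except ET.ParseError:
        return False
    return True


# The server would run this check before sending, so that a thin client
# can skip validation.
ok = is_well_formed_tree("<D><Se><P>a</P></Se></D>")
broken = is_well_formed_tree("<D><Se></D>")
```

Doing the check once on the server, rather than on every client, is the design choice the paragraph above argues for.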

[0020] FIG. 2 illustrates a computer system including
a server that automatically updates local copies of
documents in accordance with another embodiment of
the present invention. In the embodiment illustrated in
FIG. 2, network 202 couples together server 204 with
workstation 206, personal computer 208, network computer
210 and personal organizer 212. Network 202
generally refers to any type of wire or wireless link
between computers, including, but not limited to, a local
area network, a wide area network, or a combination of
networks. In one embodiment of the present invention,
network 202 includes the Internet. Server 204 may be
any node coupled to network 202 that includes a mechanism
for servicing requests from a client for computational
or data storage resources. Server 204
communicates with a number of clients, including workstation
206, personal computer 208, network computer
210 and personal organizer 212. In general, a client
may include any node coupled to network 202 that contains
a mechanism for requesting computational or data
storage resources from server 204. Note that network
computer 210 and personal organizer 212 are both "thin
clients," because they rely on servers, such as
server 204, for data storage and computational
resources. Personal organizer 212 refers to any of a
class of portable personal organizers containing computational
and memory resources. For example, personal
organizer 212 might be a PALMPILOT™
distributed by the 3COM Corporation of Sunnyvale,
California. (PalmPilot is a trademark of the 3COM
Corporation).

[0021] In the illustrated embodiment, workstation
206, personal computer 208, network computer 210
and personal organizer 212 contain cached documents
207, 209, 211 and 213, respectively. Cached documents
207, 209, 211 and 213 contain locally cached
portions of documents from server 204.

[0022] Server 204 is coupled to document database
214, which includes documents to be distributed to clients
206, 208, 210 and 212. Document database 214
may exist in a number of locations and in a number of
forms. In one embodiment of the present invention, document
database 214 resides within the same computer
system as server 204. In another embodiment, document
database 214 resides at a remote location that is
accessed by server 204 across network 202. Portions of
document database 214 may reside in volatile or
non-volatile semiconductor memory. Alternatively, portions
of document database 214 may reside within rotating
storage devices containing magnetic, optical or
magneto-optical storage media.

[0023] Server 204 includes publishing code 205,
which includes computer code that disseminates information
across network 202 to workstation 206, personal
computer 208, network computer 210 and personal
organizer 212. Publishing code 205 includes a mechanism
that automatically creates updates for locally
cached copies of documents from document database
214 stored in clients 206, 208, 210 and 212.
[0024] During operation, the system illustrated in
FIG. 2 operates generally as follows. Publishing code
205 periodically receives new content 230, and uses
new content 230 to update documents within document
database 214. Publishing code 205 also periodically
constructs updates for remotely cached copies of documents
from document database 214, and sends these
updates to clients, such as workstation 206, personal
computer 208, network computer 210 and personal
organizer 212. Note that these updates do not simply
contain new versions of cached documents, but rather
specify changes to cached documents.

Updating Process

[0025] FIG. 3 is a flow chart illustrating how a client
requests an update from a server in accordance with an
embodiment of the present invention. This flow chart
describes the operation of the invention with reference
to the embodiment illustrated in FIG. 1. First, the system
receives a request to access the data (step 302). In FIG. 1,
this corresponds to user 110 requesting access to a
web page or a portion of a web page through web
browser 114 on client 106. Next, the system determines
if client 106 contains a copy of the data (step 304). This
corresponds to web browser 114 looking in cached copies
118 for the requested data. If the data is not present
on client 106, the system simply sends a copy of the
requested data from server 104 to client 106 (step 306),
and this copy is stored in cached documents 118.
[0026] If a copy of the data is present on client 106,
client 106 sends an update request 120 to server 104
requesting an update to the copy (step 308). In one
embodiment of the present invention, update request
120 includes a timestamp indicating how long ago the
previous update to cached documents 118 was created.
In response to update request 120, server 104 determines
differences between the copy of the data on client
106, and the data from document database 116 (step
310). These differences are used to construct an update
122, which specifies operations to update the copy of
the data on client 106 (step 312). Note that if client 106
sends a timestamp along with the request in step 308,
the timestamp can be used to determine the differences
between the data on server 104 and the cached copy of
the data on client 106. In another embodiment of the
present invention, server 104 saves update 122, so that
server 104 can send update 122 to other clients. In yet
another embodiment, server 104 keeps track of
changes to the data from document database 116 as
the changes occur; these changes are aggregated into
update 122. This eliminates the need to actually find
differences between the data from document database
116 and the cached copy of the data on client 106.
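
The last embodiment above, in which the server tracks changes as they occur and aggregates them into an update, can be sketched as a timestamped change log. This is an illustration only: the class name ChangeLog, its methods, and the use of absolute numeric timestamps are assumptions, not details given in the text.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeLog:
    """Hypothetical server-side record of node operations in time order."""

    entries: list = field(default_factory=list)  # (timestamp, operation) pairs

    def record(self, timestamp: float, operation: str) -> None:
        """Remember an operation at the moment it is applied on the server."""
        self.entries.append((timestamp, operation))

    def update_since(self, client_timestamp: float) -> list:
        """Aggregate every operation newer than the client's last update."""
        return [op for ts, op in self.entries if ts > client_timestamp]


log = ChangeLog()
log.record(10.0, "DEL(D0.Se1)")
log.record(20.0, "MOV(D0.Se0.P0.S2, D0.Se2.P1.S0)")

# A client whose copy was last updated at time 15.0 only needs the later move.
update = log.update_since(15.0)
```

With such a log, the timestamp sent in step 308 selects the operations directly, so no comparison of trees is needed for that client.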
[0027] Also note that the operations specified by
update 122 may include manipulations of nodes within
the data. For example, if the data is hierarchically organized
as nodes in a tree structure, the update may specify
tree node manipulation operations, such as move,
swap, copy, insert and delete operations for leaf nodes.
The update may also specify sub-tree move, copy, swap,
insert and delete operations, as well as internal node
splitting and internal node collapsing operations. Transmitting
such node manipulation operations, instead of
transmitting the data that results after the node manipulation
operations have been applied to the data, can
greatly reduce the amount of data that must be transmitted
to update a copy of the data on client 106.
[0028] The update may additionally include a Multipurpose
Internet Mail Extensions (MIME) content type
specifying that the update contains updating operations
for hierarchically organized data. This informs a client
receiving update 122 that update 122 contains update
information, and not regular data. The MIME content
type may specify that update 122 contains updating
information that has been validated by server 104 so
that client 106 does not have to validate update 122.
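
As a rough sketch of how a MIME content type could mark an update, the example below wraps a list of operations in a message using Python's standard email package. The content type application/x-tree-update is a made-up placeholder, and the helper names wrap_update and is_update are hypothetical; the text does not name a specific type.

```python
from email.message import EmailMessage

# Hypothetical content type: the text does not specify an actual name.
UPDATE_MAINTYPE = "application"
UPDATE_SUBTYPE = "x-tree-update"


def wrap_update(operations: list) -> EmailMessage:
    """Package update operations under the (assumed) update content type."""
    msg = EmailMessage()
    body = "\n".join(operations).encode("utf-8")
    msg.set_content(body, maintype=UPDATE_MAINTYPE, subtype=UPDATE_SUBTYPE)
    return msg


def is_update(msg: EmailMessage) -> bool:
    """Let the client tell update information apart from regular data."""
    return msg.get_content_type() == f"{UPDATE_MAINTYPE}/{UPDATE_SUBTYPE}"


message = wrap_update(["DEL(D0.Se0.P0)"])
```

A second placeholder type could be used in the same way to signal that the server has already validated the update.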
[0029] In one embodiment of the present invention,
the steps of determining the differences (step 310) and
constructing the update 122 (step 312) take place concurrently
during a single pass through the data. This
technique has performance advantages over performing
these steps separately in two passes through the
data.

[0030] Next, update 122 is sent from server 104 to
client 106 (step 314), and client 106 applies update 122
to the copy of the data (step 316). In one embodiment of
the present invention, the copy of the data is stored in
semiconductor memory within client 106, and hence
applying update 122 to the copy of the data involves fast
memory operations, instead of slower disk access
operations.

[0031] Finally, the original access to the data (from
step 302) is allowed to proceed, so that user 110 can
view the data on display 108. The above process is
repeated for successive accesses to the copy of the
data on client 106.

[0032] Note that although the illustrated embodiment
of the present invention operates in the context of
a web browser and a web server, the present invention 
can be applied in any context where updates to data 
have to be propagated to copies of the data. For exam- 
ple, the present invention can be applied to distributed 
database systems. 

[0033] FIG. 4 is a flow chart illustrating how server
204 (from FIG. 2) automatically updates local copies of
documents in accordance with an embodiment of the
present invention. This embodiment is an implementation
of a "push" model, in which data is pushed from a
server 204 to clients 206, 208, 210 and 212 without the
clients having to ask for the data. This differs from a
"request" model, in which the clients have to explicitly
request data before it is sent, as is illustrated in FIG. 1.
[0034] The flow chart illustrated in FIG. 4 describes
the operation of the invention with reference to the
embodiment illustrated in FIG. 2. First, server 204
receives new content 230 (step 402). This new content
230 may take the form of live updates to document
database 214, for example in the form of stock pricing
information. New content 230 is used to update documents
or other data objects within document database 214 on
server 204 (step 404).

[0035] Next, publishing code 205 within server 204
determines differences between the data in document
database 214 and copies of the data on clients
(subscribers) 206, 208, 210 and 212 (step 406). These
differences are used to construct updates 216, 218, 220
and 222, which specify operations to change copies of
the data on clients 206, 208, 210 and 212, respectively
(step 408).

[0036] Updates 216, 218, 220 and 222 may specify
operations that manipulate nodes within the data. For
example, if the data is hierarchically organized as nodes
in a tree structure, updates 216, 218, 220 and 222 may
specify tree node manipulation operations, such as
move, swap, copy, insert and delete operations for leaf
nodes. The updates may also specify sub-tree move, copy,
swap, insert and delete operations, as well as internal node
splitting and internal node collapsing operations.
Transmitting such node manipulation operations, instead of
transmitting the data after node manipulation operations
have been applied to it, can greatly reduce the amount
of data that must be transmitted to update copies of the
data on clients 206, 208, 210 and 212.

[0037] In one embodiment of the present invention,
the steps of determining the differences (step 406) and
of constructing updates 216, 218, 220 and 222 (step
408) take place concurrently during a single pass
through the data. This can have a significant performance
advantage over performing these steps in two separate
passes through the data.
[0038] Next, updates 216, 218, 220 and 222 are
sent from server 204 to clients 206, 208, 210 and 212
(step 410), respectively. Clients 206, 208, 210 and 212
apply updates 216, 218, 220 and 222 to their local copies
of the data 207, 209, 211 and 213, respectively (step
412). In one embodiment of the present invention, these
updates are applied to the local copies 207,
209, 211 and 213 "in memory," without requiring disk
accesses. This allows the updates to be performed very
rapidly.

[0039] The above process is periodically repeated
by the system in order to keep copies of the data on clients
206, 208, 210 and 212 at least partially consistent
with the data on server 204. This updating process may
be repeated at any time interval from, for example, several
seconds to many days.
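
The push cycle of steps 402 through 410 can be sketched as follows. This is a deliberately simplified illustration: the data is modeled as a flat mapping from node path to content rather than a full tree, the Publisher class and its method names are hypothetical, and only delete, insert and update operations are produced. The server keeps its own copy of what each client holds, as the summary describes.

```python
class Publisher:
    """Hypothetical stand-in for publishing code 205 (flat-data sketch)."""

    def __init__(self, document: dict):
        self.document = dict(document)  # current data: path -> content
        self.client_copies = {}         # client id -> server-side copy of client data

    def subscribe(self, client_id: str) -> None:
        """Register a client that starts with the current data."""
        self.client_copies[client_id] = dict(self.document)

    def receive_content(self, changes: dict) -> dict:
        """Apply new content (None deletes a path), then build per-client updates."""
        for path, content in changes.items():          # steps 402 and 404
            if content is None:
                self.document.pop(path, None)
            else:
                self.document[path] = content
        updates = {}
        for client_id, old in self.client_copies.items():  # steps 406 and 408
            ops = [("DEL", p) for p in old if p not in self.document]
            ops += [("INS", p, c) for p, c in self.document.items() if p not in old]
            ops += [("UPD", p, c) for p, c in self.document.items()
                    if p in old and old[p] != c]
            updates[client_id] = ops
            self.client_copies[client_id] = dict(self.document)
        return updates  # step 410 would send each entry to its client


publisher = Publisher({"D0.S0": "a", "D0.S1": "b"})
publisher.subscribe("organizer")
updates = publisher.receive_content({"D0.S1": "B", "D0.S2": "c", "D0.S0": None})
```

The clients never ask for anything here; the update is triggered solely by new content arriving at the server.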

Process of Creating Updates

[0040] FIG. 5 is a flow chart illustrating how the system
creates updates at the server for a new copy of
hierarchically structured data in accordance with an
embodiment of the present invention. This embodiment
assumes that the data is hierarchically organized as a
collection of nodes in a tree structure. This tree structure
includes a root node that can have a number of
children. These children can also have children, and so on,
until leaf nodes, which have no children, are reached.
Note that the below-described process for creating
updates requires only a single pass through the data.
During this single pass the system determines differences
between old and new trees and creates corresponding
updates to convert the old tree into the new
tree. This eliminates the need for a separate time-consuming
pass through the data to create updates from
differences.
[0041] The system starts with an old tree (old_t)
and a new tree (new_t). The system first matches leaf
nodes of old_t and new_t (step 502). In doing so, the
system may look for exact matches or partial matches of
the data stored in the leaf nodes. In the case of partial
matches, if the quality of a match is determined to be
above a preset threshold, the leaf nodes are considered
to be "matched." Next, the system generates deletion
operations to remove nodes from old_t which are not
present in new_t (step 504).
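
Steps 502 and 504 can be sketched as follows, under simplifying assumptions: each tree is reduced to a mapping from leaf path to leaf content, and only exact matches are considered (the partial-match threshold is left out). The function name match_and_delete is hypothetical.

```python
def match_and_delete(old_leaves: dict, new_leaves: dict):
    """Match leaves of old_t to new_t by exact content, then emit deletions.

    Both arguments map a leaf path (such as "D0.S1") to the leaf content.
    Returns (matches, deletions): matches maps an old path to a new path,
    and deletions lists a DEL operation for every unmatched old leaf.
    """
    # Index the new tree's leaves by content so each can be claimed once.
    unclaimed = {}
    for path, content in new_leaves.items():
        unclaimed.setdefault(content, []).append(path)

    matches, deletions = {}, []
    for path, content in old_leaves.items():
        candidates = unclaimed.get(content)
        if candidates:
            matches[path] = candidates.pop(0)   # step 502: leaf is matched
        else:
            deletions.append(f"DEL({path})")    # step 504: not present in new_t
    return matches, deletions


old_t = {"D0.S0": "a", "D0.S1": "b", "D0.S2": "c"}
new_t = {"D0.S0": "c", "D0.S1": "a"}
matches, deletions = match_and_delete(old_t, new_t)
```

Matched leaves whose paths differ (here both "a" and "c") are the candidates for the move operations generated in the later steps.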
[0042] In the next phase, the system repeats a
number of steps (506, 508, 510 and 512) for ascending
levels of the tree. First, for a given level, the system
generates node insertion operations for nodes that are
present in new_t but not in old_t (step 506). Also, if the
position of a node in old_t is different from the position
of the same node in new_t, the system generates a
move operation, to move the node from its position in
old_t to its new position in new_t (step 508). Additionally,
if a parent node in old_t does not have all of the
same children in new_t, the system generates a node
split operation for the parent, splitting the parent node
into a first parent and a second parent (step 510). The
first parent inherits all of the children that are present in
new_t, and the second parent inherits the remaining
children. If a parent node in old_t has all of the same
children and additional children in new_t, the system
generates a node collapse operation to bring all the
children together in new_t (step 512).
[0043] Additionally, if all of the children of a first
parent in old_t move to a second parent in new_t, the
system generates a node collapse operation to collapse the
first parent into the second parent so that all of the
children of the first parent are inherited by the second
parent.
[0044] The system repeats the above-listed steps
506, 508, 510 and 512 until the root of the tree is
reached. At this point all of the operations that have
been generated are assembled together to create an
update that transforms old_t into new_t (step 514).

[0045] Let us consider the example tree illustrated
in FIG. 6A. This tree may represent a document
consisting of sections, paragraphs and individual sentences
containing parsable character data. Assume that the
document grammar also allows documents to contain
non-character data, say numeric data, as is represented
by the leaf node identifier 'd'. All nodes in FIG. 6A
include a name (tag), a value, and an associated value
identifier. Since the leaf nodes actually contain data,
value identifiers are assigned to them before the process
starts; whereas, for an internal node, a value identifier
is assigned during the comparison process based
upon the value identifiers of the internal node's
children. Note that in some embodiments of the present
invention, the tree data structure as represented in
memory may conform to the World Wide Web Consortium
document object model (W3C DOM).
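
One way to derive an internal node's value identifier from its children's identifiers is sketched below. The use of SHA-256 and the tuple representation of nodes are assumptions made for illustration; the text does not say how the identifiers are combined.

```python
import hashlib


def value_identifier(node) -> str:
    """Compute a value identifier bottom-up.

    A node is (tag, payload): the payload is a string for a leaf node and a
    list of child nodes for an internal node (assumed representation).
    """
    _tag, payload = node
    if isinstance(payload, list):
        # Internal node: the identifier depends only on the children's
        # identifiers, in order.
        material = "|".join(value_identifier(child) for child in payload)
    else:
        # Leaf node: the identifier depends on the data the leaf contains.
        material = payload
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


paragraph_1 = ("P", [("S", "first sentence"), ("S", "second sentence")])
paragraph_2 = ("P", [("S", "first sentence"), ("S", "second sentence")])
reordered = ("P", [("S", "second sentence"), ("S", "first sentence")])
```

Two sub-trees with the same leaves in the same order receive the same identifier, so an unchanged sub-tree can be recognized without comparing its contents again.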
[0046] Additionally, in some embodiments of the
present invention, the hierarchically organized data
includes data that conforms to the Extensible Markup
Language (XML) standard. In other embodiments of the
present invention, the hierarchically organized data
includes data that conforms to the HyperText Markup
Language (HTML) standard, and other markup language
standards.

Notational Semantics

[0047] We represent each leaf node by the path
from root node to the leaf node containing the position
of each node along the path. Hence, the notation for
each of the leaf nodes in FIG. 6A is as follows:

D0.Se0.P0.S0 (left-most node)
D0.Se0.P0.S1
D0.Se0.P0.S2
D0.Se0.P0.S3
D0.Se0.P1.S0
D0.Se1.N0
D0.Se2.P0.S0
D0.Se2.P1.S0
D0.Se2.P1.S1
D0.Se2.P2.S0
D0.Se2.P2.S1 (right-most node)

The above notation is used to locate and represent any 
node in the tree, whether it be a leaf node or internal 
node. 
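
The path notation can be parsed and resolved against an in-memory tree as sketched below. The Node class and the function names are hypothetical, and the sketch assumes that the number in each step is the node's position among its parent's children.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)
    value: str = ""


def parse_path(path: str) -> list:
    """Split 'D0.Se2.P1.S0' into [('D', 0), ('Se', 2), ('P', 1), ('S', 0)]."""
    steps = []
    for step in path.split("."):
        match = re.fullmatch(r"([A-Za-z]+)(\d+)", step)
        if match is None:
            raise ValueError(f"bad path step: {step!r}")
        steps.append((match.group(1), int(match.group(2))))
    return steps


def locate(root: Node, path: str) -> Node:
    """Follow the path from the root to a leaf node or an internal node."""
    steps = parse_path(path)
    tag, position = steps[0]
    if tag != root.tag or position != 0:
        raise KeyError(path)
    node = root
    for tag, position in steps[1:]:
        node = node.children[position]  # assumed: position among siblings
        if node.tag != tag:
            raise KeyError(path)
    return node


root = Node("D", [
    Node("Se", [Node("P", [Node("S", value="x"), Node("S", value="y")])]),
    Node("Se", [Node("N", value="d")]),
])
```

For example, locate(root, "D0.Se1.N0") returns the numeric leaf, and locate(root, "D0.Se0.P0") returns an internal paragraph node.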

[0048] The notational semantics for each of the tree 
transformation operations is as follows: 

* MOV(D0.Se0.P0.S2, D0.Se2.P1.S0). In FIG. 6A,
this operation moves the leaf node with value identifier
'a'. Note that a similar operation can be used to
represent a movement of an internal node. In the
case of an internal node, the entire sub-tree moves.
Thus, the movement of an individual node or a
sub-tree can be an inter-parent move or an intra-parent
move.

* SWP(D0.Se0.P0.S2, D0.Se0.P0.S1). This operation
is permitted only in the case of nodes that
share a common parent (i.e., intra-parent only). The
operation swaps the position of the affected nodes
under the common parent. In the case of internal
nodes, entire sub-trees are swapped.

* CPY(D0.Se0.P0, D0.Se2.P2). This operation replicates
a node by making an identical copy of the
node. In the case of internal nodes, the entire
sub-tree is copied.

* INS(D0.Se0.P0.S0, 'a', {data}). This operation
inserts a node in the tree at the given position and
assigns to it a value identifier 'a' along with the
{data}. In the case of an internal node, the
{data} assigned contains a null value.

* DEL(D0.Se0.P0). This operation deletes a node and
all of its children.

* SPT(D0.Se0.P0, I). This operation splits a parent
node into a first node and a second node. All of the
children of the parent node starting at position I are
transferred to the first node. The remaining children
are transferred to the second node. The first node
gets the same tag type as the original parent node.

* CLP(D0.Se0.P0, D0.Se0.P1). This operation collapses
the contents of a first node and a second
node. The resulting node gets the same tag type as
the first node. The children of the second become
the right-most children of the resulting node.

* UPD(D0.Se0.P0.S2, {delta}). This operation specifies
a change {delta} to the contents of a leaf node.
The {delta} itself describes how to apply (or merge)
the change.
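
A client-side interpreter for a subset of these operations (DEL, INS, SWP, MOV and CLP) might look like the sketch below. The Node class, the apply_op function and the helper names are hypothetical. The sketch also assumes that, for MOV, both paths are resolved before the node is removed, and that the destination position within the same parent is counted after the removal; the text leaves these details open.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)
    value: str = ""
    value_id: str = ""


def parent_and_index(root: Node, path: str):
    """Resolve 'D0.Se1.P0.S0' to (parent node, position of the last step)."""
    positions = [int("".join(ch for ch in step if ch.isdigit()))
                 for step in path.split(".")][1:]  # drop the root step D0
    *upper, last = positions
    node = root
    for position in upper:
        node = node.children[position]
    return node, last


def apply_op(root: Node, op: str, *args) -> None:
    """Apply one update operation to the in-memory tree."""
    if op == "DEL":    # delete a node and all of its children
        parent, i = parent_and_index(root, args[0])
        del parent.children[i]
    elif op == "INS":  # insert a new node at the given position
        path, value_id, data = args
        parent, i = parent_and_index(root, path)
        tag = path.split(".")[-1].rstrip("0123456789")
        parent.children.insert(i, Node(tag, value=data, value_id=value_id))
    elif op == "SWP":  # swap two nodes under a common parent
        parent, i = parent_and_index(root, args[0])
        _, j = parent_and_index(root, args[1])
        parent.children[i], parent.children[j] = parent.children[j], parent.children[i]
    elif op == "MOV":  # move a node or a whole sub-tree
        src_parent, i = parent_and_index(root, args[0])
        dst_parent, j = parent_and_index(root, args[1])
        dst_parent.children.insert(j, src_parent.children.pop(i))
    elif op == "CLP":  # second node's children become right-most children of first
        first_parent, i = parent_and_index(root, args[0])
        second_parent, j = parent_and_index(root, args[1])
        first, second = first_parent.children[i], second_parent.children[j]
        first.children.extend(second.children)
        second_parent.children.remove(second)
    else:
        raise ValueError(f"unsupported operation: {op}")


def leaf_values(node: Node) -> list:
    """List leaf contents from left to right."""
    if not node.children:
        return [node.value]
    return [v for child in node.children for v in leaf_values(child)]


tree = Node("D", [
    Node("Se", [Node("P", [Node("S", value="a"), Node("S", value="b"), Node("S", value="c")])]),
    Node("Se", [Node("P", [Node("S", value="d")])]),
])
apply_op(tree, "SWP", "D0.Se0.P0.S0", "D0.Se0.P0.S1")
apply_op(tree, "DEL", "D0.Se0.P0.S2")
```

Each operation touches only a few child lists in memory, which is what makes applying an update cheaper than replacing the whole copy.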

[0049] The example described below generates a
set of operations to transform an old tree T1 (FIG. 6A)
into a new tree T2 (FIG. 6B). Note that in this example
the leaf nodes contain actual data, and the internal
nodes simply contain tags which organize and describe
the data. There are three phases in the process, including:
(1) matching the leaf nodes in T1 and T2; (2) deleting
nodes in T1 with no match in T2; and (3) modifying
or moving the remaining nodes to create T2.

Phase 1: Matching Leaf Nodes

[0050] The first step is to generate a unique identifier
for each of the leaf nodes in T2 based on the content of
the leaf node. This can be accomplished by using a hash
function to generate a unique identifier for each of the
leaf nodes. If two leaf nodes have the same content, then
the hash function generates the same identifier. If two
leaf nodes have the same identifier, it will not cause
problems, because the process uses the root node to leaf
node path to identify the individual nodes.
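
A content-based identifier of this kind might be sketched as follows. The choice of SHA-1, the 12-character truncation, and the function names are assumptions for illustration; the specification only requires that equal content yields equal identifiers and that the root-to-leaf path disambiguates duplicates.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def leaf_identifier(content: str) -> str:
    """Derive a value identifier from leaf content (assumed: SHA-1, truncated)."""
    return hashlib.sha1(content.encode("utf-8")).hexdigest()[:12]


def index_leaves(leaves: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map identifier -> root-to-leaf paths of all leaves with that content."""
    index: Dict[str, List[str]] = {}
    for path, content in leaves:
        index.setdefault(leaf_identifier(content), []).append(path)
    return index


# Two leaves with identical content share an identifier but keep distinct paths.
t2_index = index_leaves([
    ("D0.Se0.P0.S0", "quarterly results"),
    ("D0.Se0.P0.S1", "contact details"),
    ("D0.Se2.P1.S0", "quarterly results"),
])
print(t2_index[leaf_identifier("quarterly results")])
```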
[0051] Next, the process assigns value identifiers to leaf
nodes of T1. For a given leaf node in T1, the process uses
a hash function to generate a unique identifier, which
matches one of the leaf node identifiers in T2. If the
identifier generated does not match any of the identifiers
in T2, then the process attempts to find a closest matching
leaf node in T2, based on some matching criteria. For
example, the process may use the Longest Common
Sub-sequence (LCS) algorithm ("Data Structures and
Algorithms," Aho, Alfred V., Hopcroft, John E., and Ullman,
Jeffrey D., Addison-Wesley, 1983, pp. 189-194) to determine
a percentage match between the contents of leaf nodes in T1
and T2. The matching criterion can be flexible. For
example, the matching criterion may specify a minimum of
30% commonality in order for the leaf nodes to be matched.
[0052] Allowing matches to be made on an acceptable
matching criterion provides a measure of flexibility. In
case a given leaf node's content has been only slightly
modified in going from T1 to T2, the system simply matches
the node with its modified version in T2. The process
subsequently makes the leaf nodes consistent through the
UPD(node, delta) operation. However, if the commonality
between leaf nodes being matched does not satisfy the
matching criterion, the process assigns a unique value
identifier to the leaf node in T1, which indicates that the
leaf node has been deleted.
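
The weak-matching step can be sketched with a standard dynamic-programming LCS. The specification does not say how the percentage is computed; dividing the LCS length by the length of the longer string is an assumption made here, as are the function names. The 30% threshold is the example value given above.

```python
from typing import Iterable, Optional


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def commonality(a: str, b: str) -> float:
    """Fraction of the longer string covered by the LCS (assumed definition)."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def closest_match(content: str, candidates: Iterable[str],
                  threshold: float = 0.30) -> Optional[str]:
    """Best candidate meeting the threshold, or None (leaf treated as deleted)."""
    best = max(candidates, key=lambda c: commonality(content, c), default=None)
    if best is None or commonality(content, best) < threshold:
        return None
    return best


print(closest_match("stock price 101", ["stock price 102", "weather"]))
```

A leaf for which `closest_match` returns `None` corresponds to the case in which the process assigns a fresh identifier and later deletes the node; a successful weak match is later reconciled with an UPD operation.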

[0053] In the worst case, the time complexity of finding a
match between the leaf nodes will be O(K^2), where K is the
number of unique leaf node identifiers in T1 and T2. In the
best case, where the leaf nodes in T1 and T2 match in a
straightforward manner, the complexity will be 2*K.
However, the number of changes in a document from one
version to another is typically fairly small, in which case
only a few leaf nodes need to be matched based on the weak
matching criteria.

Phase 2: Deletion phase 

[0054] After the matching phase is complete, there may be
some leaf nodes in T1 which are not matched to nodes in T2.
These unmatched nodes are deleted as follows.

* For unmatched leaf nodes in T1 (from left to right), 
create a delete operation, such as 
DEL(D0.Se2.P2.S0). 

* Reduce the number of delete operations by replacing them
with sub-tree delete operations, if possible. If all
children belonging to a parent are to be deleted, the
delete operation of each of the children can be replaced by
a single delete operation of the parent node. This involves
scanning the deletion list, looking for common parents. If
T1 has K levels, at most K-1 scans are needed to identify a
common parent for deletion. Notice that while scanning the
i-th level, unreduced nodes in the (i+1)-th level can be
ignored, since they cannot be further reduced. After the
reductions are performed, the final deletion list is
repositioned, because deleting a node at position '0'
alters the relative positioning of adjacent nodes. Hence,
if two delete operations are to be performed on nodes that
have a common parent, then the second delete operation
needs to be altered to reflect the change in position of
the second node to be deleted.
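
The two list-processing steps above (reduction to sub-tree deletes, then repositioning of sibling deletes) can be sketched as follows. Paths are written as tuples of child indexes below the root, so (0, 0, 1) stands for D0.Se0.P0.S1; this encoding, the `child_count` table, and the function names are assumptions for illustration. The sketch adjusts positions only among siblings, as the text describes, and does not handle shifts caused by deleting an ancestor's earlier sibling.

```python
from typing import Dict, Iterable, List, Tuple

Path = Tuple[int, ...]  # child indexes below the root


def reduce_deletes(deletes: Iterable[Path],
                   child_count: Dict[Path, int]) -> List[Path]:
    """Replace deletes of all children of a parent by one delete of the parent."""
    pending = set(deletes)
    deepest = max((len(p) for p in pending), default=0)
    for level in range(deepest, 1, -1):        # deepest level first, root excluded
        by_parent: Dict[Path, List[Path]] = {}
        for p in pending:
            if len(p) == level:
                by_parent.setdefault(p[:-1], []).append(p)
        for parent, kids in by_parent.items():
            if len(kids) == child_count.get(parent):
                pending.difference_update(kids)
                pending.add(parent)
    return sorted(pending)


def reposition(ordered: List[Path]) -> List[Path]:
    """Shift sibling positions to account for earlier deletes in the list."""
    removed: Dict[Path, List[int]] = {}
    out: List[Path] = []
    for p in ordered:
        parent, pos = p[:-1], p[-1]
        shift = sum(1 for q in removed.get(parent, []) if q < pos)
        out.append(parent + (pos - shift,))
        removed.setdefault(parent, []).append(pos)
    return out


# Unmatched leaves y, t, h, i of FIG. 6A, with the child counts of their parents.
deletes = [(0, 0, 0), (0, 0, 1), (2, 2, 0), (2, 2, 1)]
child_count = {(0,): 2, (0, 0): 4, (2,): 3, (2, 2): 2}
reduced = reduce_deletes(deletes, child_count)
final = reposition(reduced)
print(reduced)
print(final)
```

In this encoding the result corresponds to DEL(D0.Se0.P0.S0), DEL(D0.Se0.P0.S0), DEL(D0.Se2.P2).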

[0055] In the instant example, leaf nodes y, t, h and i in
FIG. 6A are unmatched. In accordance with the first step,
the system creates the following delete operations:

DEL(D0.Se0.P0.S0),
DEL(D0.Se0.P0.S1),
DEL(D0.Se2.P2.S0),
DEL(D0.Se2.P2.S1).

In the second step, scanning left to right (scan level 4),
the system notices that all of D0.Se2.P2's children are to
be deleted. By reducing the individual delete operations
"DEL(D0.Se2.P2.S0)" and "DEL(D0.Se2.P2.S1)" into a single
delete operation of the parent "DEL(D0.Se2.P2)", we are
left with the following delete operations:

DEL(D0.Se0.P0.S0),
DEL(D0.Se0.P0.S1),
DEL(D0.Se2.P2).



[0056] Continuing with the level 3 scan, the system notices
that the only eligible delete operation for reduction is
DEL(D0.Se2.P2), since the other delete operations
DEL(D0.Se0.P0.S0) and DEL(D0.Se0.P0.S1) are at level 4.
Since D0.Se2.P2's parent has other children which do not
participate in the delete operation, the reduction ends at
scan level 3.

[0057] In the third step, the system checks to see if
applying the first delete operation will affect the
relative node position of any other delete operation. This
involves looking for nodes having the same parent as the
node being deleted. If such a node exists, the system
adjusts its node position accordingly. Note that the entire
deletion list need not be scanned to identify sibling
nodes, because the inherent ordering in the deletion list
ensures that deletion operations for sibling nodes will be
close together in the deletion list.

[0058] Continuing with the example, the system notices that
applying the delete operation DEL(D0.Se0.P0.y0) will affect
the relative positioning of sibling node D0.Se0.P0.t1. So,
the system adjusts the position of its sibling (see FIG.
6C). Hence, the final deletion list becomes:

DEL(D0.Se0.P0.S0),
DEL(D0.Se0.P0.S0),
DEL(D0.Se2.P2).

Phase 3: Modification Phase

[0059] The modification phase brings together the children
of internal nodes, in a bottom-up fashion. This involves
scanning all the nodes from the bottom-most level (furthest
from the root), and scanning each level until level zero is
reached. Note that the identity of each internal node is
established by the collective identity of its children. For
example, if a parent node's children are identified as 'a'
and 'b' respectively, then the identity of the parent is
'ab.'

[0060] Also, if a parent node is left with no children as a
result of a move operation, the parent node is deleted.
Furthermore, in the special case where there is a skewed
tree or sub-tree of nodes having just one child, i.e.,
a->b->c->d, when node 'd' is deleted, node 'c' is also
deleted. This action is repeated until node 'a' is deleted
as well. Instead of generating an individual delete
operation for each one of the nodes, the chain of delete
operations is reduced to a single delete operation of the
grandest common parent of all nodes being deleted.

[0061] Pseudo-code for one embodiment of the modification
phase appears below.

For each level i in T2 (leaf to the root) {

1. TO_BE_COMPLETED_LIST = list of all the node value
identifiers at level i in T2.

2. If the node in the TO_BE_COMPLETED_LIST is the root
node, find the matching node 'r' in T1'. If 'r' happens to
be a root node, break from the loop. Else, partition T1'
into two nodes, such that the sub-tree rooted at 'r' is
moved away from T1', and becomes another tree (T1''). Next,
delete the source partition (T1') by deleting its grandest
common parent (the root). T1'' and T2 are now identical.

3. Pick one of the nodes 'k' from TO_BE_COMPLETED_LIST,
typically the left-most node. SIBLING_LIST = siblings of
'k', including 'k'. Note that we use the term 'node' in
place of a node identifier, for convenience.

4. If none of the nodes in the SIBLING_LIST have a matching
node in T1', create a parent node 'p' in T1' having the
same tag type as the one in T2 (i.e., same as the parent of
the nodes in the sibling list in T2). Insert all of the
nodes in the sibling list into the newly created parent
node in T1'. Next, move the newly created node (along with
its children) to be the child of any internal node,
preferably one of the parent nodes at level i-2, if such a
level exists.

5. Let S be the subset of nodes in the SIBLING_LIST that
have a match in T1'. Find a parent node 'p' in T1', which
has the most siblings in the SIBLING_LIST.

   * Move the rest of the matched nodes in S to be the
   children of 'p'. If any subset of nodes being moved have
   a common parent 'q', and if 'q' has no other children,
   then collapse 'q' into 'p'. Else, individual nodes are
   moved by separate move operations. The unmatched nodes
   in the SIBLING_LIST are inserted into 'p'.

   * Order the children of 'p' through swap operations. At
   the end of the swaps, all the children of 'p' which do
   not happen to be the children of its peer, if any, are
   gathered in the right-most corner. If there are such
   children, then a node split operation is performed, so
   that 'p' has exactly the same children as its peer. The
   newly created node (sub-tree) is at the same level as
   'p' and has the same parent as 'p'. Also, the tag type
   of 'p' is changed to be the same as its peer in T2, if
   it is different.

6. Assign a node identity to 'p', which is the collective
identity of its latest children. Similarly, assign an
identical identity to the peer node of 'p' in T2.

7. TO_BE_COMPLETED_LIST = TO_BE_COMPLETED_LIST -
SIBLING_LIST

8. If TO_BE_COMPLETED_LIST is not equal to NULL, then
return to step 2, else continue.

} (end of for loop)

[0062] Note that the above node movement opera- 
tions cause changes in the relative positioning of sibling 
nodes. Hence, the node operations generated by the 
process should take into account the positional changes 
caused by node movements. 
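
The collective identities that drive the bottom-up scan can be computed with a short sketch. The nested-tuple tree encoding `(tag, children_or_value)` and the function names are assumptions for illustration; the tree is T2 as read from FIG. 6B, with the N node carrying the value 'd' directly.

```python
from typing import List, Tuple, Union

Tree = Tuple[str, Union[str, list]]  # (tag, value) for leaves, (tag, children) otherwise


def identity(node: Tree) -> str:
    """A leaf's identity is its value; a parent's is its children's, concatenated."""
    _, payload = node
    if isinstance(payload, str):
        return payload
    return "".join(identity(child) for child in payload)


def identities_by_level(root: Tree) -> List[List[str]]:
    """Identities of all nodes, grouped by level (index 0 is the root)."""
    levels: List[List[str]] = []
    frontier = [root]
    while frontier:
        levels.append([identity(n) for n in frontier])
        frontier = [c for n in frontier
                    if not isinstance(n[1], str) for c in n[1]]
    return levels


t2 = ("D", [
    ("Se", [("P", [("S", "g"), ("S", "c")]), ("P", [("S", "f")])]),
    ("N", "d"),
    ("Se", [("P", [("S", "e")]), ("P", [("S", "b"), ("S", "a"), ("S", "z")])]),
])

for depth, ids in enumerate(identities_by_level(t2)):
    print(depth, ids)
```

Each level's list plays the role of TO_BE_COMPLETED_LIST for that level; the order here simply follows T2 from left to right.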

[0063] The system now applies the modification algorithm on
T1' from FIG. 6C.

Level 3 scan 

[0064] Applying steps 1 and 2,
TO_BE_COMPLETED_LIST = {g, c, f, e, b, a, z} and
SIBLING_LIST = {g, c}. The system locates the children 'g'
and 'c' in T1', and chooses D0.Se2.P1 to be the parent.
Applying step 4, the system notices that nodes D0.Se2.P1
and D0.Se0.P1 need to be collapsed. This brings together
all the nodes in the SIBLING_LIST under a common parent
(see FIG. 6D).

CLP(D0.Se2.P1, D0.Se0.P1)

[0065] Next, the system uses swap operations to re-order
the nodes (see FIG. 6E):

SWP(D0.Se2.P1.S1, D0.Se2.P1.S0)
SWP(D0.Se2.P1.S2, D0.Se2.P1.S1)
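
One way to derive such a swap sequence is a left-to-right placement pass, sketched below. This is an illustrative strategy, not necessarily the one used in the described embodiment; it assumes both sequences contain the same items, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def swaps_to_order(current: Sequence[str],
                   target: Sequence[str]) -> List[Tuple[int, int]]:
    """Return (position, position) swaps that turn current into target."""
    work = list(current)
    ops: List[Tuple[int, int]] = []
    for i, want in enumerate(target):
        j = work.index(want, i)       # where the wanted child currently sits
        if j != i:
            work[i], work[j] = work[j], work[i]
            ops.append((j, i))
    return ops


# Children of D0.Se2.P1 after the collapse are f, g, c; the peer in T2 has g, c
# first, with f left at the right-most position for the later split.
print(swaps_to_order(["f", "g", "c"], ["g", "c", "f"]))
```

For this input the pass yields the position pairs (1, 0) and (2, 1), which line up with the two SWP operations listed above.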




[0066] Next, a split operation is performed to move away
children which do not truly belong to the parent (see FIG.
6F):

SPT(D0.Se2.P1, 2)



[0067] Applying step 5, the system generates an identity
for D0.Se2.P1 and its peer in T2 (see FIG. 6G). Though T2
is not shown, it is assumed that the identity has been
assigned.

[0068] Applying step 6, the system determines that
TO_BE_COMPLETED_LIST = {f, e, b, a, z}. Since
TO_BE_COMPLETED_LIST is not empty, the system returns to
step 2. SIBLING_LIST = {f}. Steps 3 and 4 do not produce
any changes. Step 5 assigns an identity to D0.Se2.P2. Step
6 removes 'f' from TO_BE_COMPLETED_LIST. Repeating the
same, the system eliminates 'd' and 'e' from
TO_BE_COMPLETED_LIST.

[0069] At this point, TO_BE_COMPLETED_LIST = {b, a, z} and
SIBLING_LIST = {b, a, z}. Step 4 selects node D0.Se0.P0 as
a matching node. At this point, node 'z' in the
SIBLING_LIST is unmatched in T2. Hence, the system inserts
node 'z'. Next, the system applies swap operations to order
the children of D0.Se0.P0. Now, TO_BE_COMPLETED_LIST is
NULL (see FIG. 6H).

INS(D0.Se0.P0.S0, z, {data})
SWP(D0.Se0.P0.S2, D0.Se0.P0.S0)

Level 2 scan

[0070] Applying steps 1 and 2, the system determines
TO_BE_COMPLETED_LIST = {gc, f, e, baz} and
SIBLING_LIST = {gc, f}. Applying step 4, the system chooses
D0.Se2 as the parent. The system next applies swap
operations to order the children, and then splits the
parent D0.Se2 to move away children that do not belong to
D0.Se2. Applying step 5, the system generates identities
for D0.Se2 and its peer in T2 (see FIG. 6I).

SWP(D0.Se2.P0, D0.Se2.P1)
SWP(D0.Se2.P1, D0.Se2.P2)
SPT(D0.Se2, 2)


[0071] Now, TO_BE_COMPLETED_LIST = {e, baz} and
SIBLING_LIST = {e, baz}. Applying step 4, the system
chooses D0.Se0 as the parent. Since P(e) is the only child,
the system collapses D0.Se0 and D0.Se3, and then re-orders
the children through swap operations. Applying step 5, the
system generates identities for D0.Se0 and its peer in T2
(see FIG. 6J).

Level 1 scan

[0072] Applying steps 1 and 2, the system determines
TO_BE_COMPLETED_LIST = {ebaz, d, gcf} and
SIBLING_LIST = {ebaz, d, gcf}. Step 4 selects D0 as the
parent, and applies the swap operations to re-order its
children, which produces T2 (see FIG. 6K).

SWP(D0.Se0, D0.Se1)

[0073] Hence, the final set of transformations to transform
T1 to T2 is:

DEL(D0.Se0.P0.S0),
DEL(D0.Se0.P0.S0),
DEL(D0.Se2.P2),
CLP(D0.Se2.P1, D0.Se0.P1),
SWP(D0.Se2.P1.S1, D0.Se2.P1.S0),
SWP(D0.Se2.P1.S2, D0.Se2.P1.S1),
SPT(D0.Se2.P1, 2),
INS(D0.Se0.P0.S0, z, {data}),
SWP(D0.Se0.P0.S2, D0.Se0.P0.S0),
SWP(D0.Se2.P0, D0.Se2.P1),
SWP(D0.Se2.P1, D0.Se2.P2),
SPT(D0.Se2, 2),
CLP(D0.Se0, D0.Se3),
SWP(D0.Se0.P0, D0.Se0.P1), and
SWP(D0.Se0, D0.Se1).

[0074] Additionally, if partial matches of leaf nodes 
were made, the leaf nodes need to be updated using 
UPD operations. 

[0075] The above process requires that all nodes in T2 be
visited and matched with corresponding nodes in T1 once.
The complexity of matching the internal nodes is O(n1+n2),
where n1 and n2 are the internal node counts of T1 and T2,
respectively. Note that nodes can be matched by hashing
node value identifiers.
[0076] Node movements and modifications also add to the
overhead. If we consider a cost-based analysis, the cost of
a transformation operation on a node 'x' is a function of
the number of children of 'x'. Thus, the net cost of all
transformations will be a function of the total number of
nodes involved directly or indirectly in the
transformation.

[0077] Since there are no cycles in the transforming
operations, the overhead contributed by the node movements
is bounded by O(LK), where L is the number of levels in the
tree, and K is the number of leaf nodes. However, typically
the number of nodes involved in the movements is very small
and does not involve all the nodes in a tree.

[0078] Hence, the worst case time complexity of the
algorithm is a summation of the cost of matching leaf nodes
O(K^2), the cost of matching internal nodes O(n1+n2), and
the overhead contributed by node movements O(LK). In an
average case analysis, where the number of changes to a
document is less than, for example, 20%, the time
complexity is a summation of the cost of matching leaf
nodes O(K), the cost of matching internal nodes O(n1+n2),
and the overhead contributed by node movements O(K).


Optimizations 

[0079] There exist a number of additional optimizations
that can be applied to the above process:

* While trying to find a parent 'p' in T1' which has the
most children in the SIBLING_LIST, if there is a tie,
choose a parent with the same tag type as the one in T2.

* While re-ordering nodes within the same parent
(intra-node movement) through swap operations, if the node
being moved out is not in the SIBLING_LIST, it can be
directly moved to be the right-most child.

* While re-ordering nodes within the same parent
(intra-node movement) through swap operations, if the node
being moved out is in the SIBLING_LIST, try to position the
node being moved out through another swap operation.


[0080] The foregoing descriptions of embodiments of the
invention have been presented for purposes of illustration
and description only. They are not intended to be
exhaustive or to limit the invention to the forms
disclosed. Many modifications and variations will be
apparent to practitioners skilled in the art. Accordingly,
the above disclosure is not intended to limit the
invention. The scope of the invention is defined by the
appended claims.

Claims

1. A method for propagating changes in hierarchically
organized data located on a server (204) to a copy of the
data (207, 209, 211, 213) located on a client (206, 208,
210, 212), comprising:

receiving (402) the changes (230) to the data (214) on the
server (204);
applying (404) the changes to the data (214) on the server
(204); and
responsively to an event on the server and independently of
the client, propagating the changes to the copy of the data
by,

   determining differences (406) between the data (214) on
   the server (204) and the copy of the data which the
   server has locally available,
   using the differences to construct (408) an update (216,
   218, 220, 222) for the copy of the data which the server
   has locally available, wherein the update may include
   node insertion and node deletion operations for
   hierarchically organized nodes in the data, and
   sending (410) the update (216, 218, 220, 222) from the
   server (204) to the client (206, 208, 210, 212).

2. The method of claim 1, wherein the acts of determining
(406) the differences and constructing (408) the update
take place during a single pass through the data.

3. The method of claim 1 or claim 2, wherein the event 
on the server includes a timer completing a prepro- 
grammed time interval. 

4. The method of any one of claims 1 to 3, wherein the 
event on the server includes a change to the data 
located on the server. 

5. The method of any one of claims 1 to 4, wherein the 
act of propagating the changes to the copy of the 
data on the client takes place automatically at regu- 
lar time intervals or at irregular time intervals. 

6. The method of any one of claims 1 to 5, wherein the act
of determining the differences (406) involves aggregating
the changes to the data and/or examining the data after the
changes have been applied to the data.

7. The method of any one of claims 1 to 6, wherein the
update (216, 218, 220, 222) includes a Multipurpose
Internet Mail Extensions (MIME) content type specifying
that the update (216, 218, 220, 222) contains updating
operations for hierarchically organized data.

8. The method of any one of claims 1 to 7, wherein the
update may include one or more of the following:

a) a node copy operation that makes an identical copy of a
node as well as any subtree of the node that may exist,

b) a node move operation that moves a node to another
location in a tree of hierarchically organized nodes,

c) a node split operation that splits a node into a pair of
nodes, and divides any children of the node that may exist
between the pair of nodes,

d) a node collapse operation that collapses a pair of nodes
into a single node, which inherits any children of the pair
of nodes that may exist,

e) a node deletion operation that includes deleting any
nodes that are subordinate to the node,

f) a node swap operation that swaps two nodes as well as
any subtrees of the nodes that may exist,

g) a node update operation.

9. The method of any one of claims 1 to 8, wherein the data
that is hierarchically organized includes data that
conforms to the HyperText Markup Language (HTML) standard
or the Extensible Markup Language (XML) standard.

10. The method of any one of claims 1 to 9, wherein the
data that is hierarchically organized includes a
hierarchical database, and/or a directory service that
supports a hierarchical name space.

11. The method of any one of claims 1 to 10, wherein the
copy of the data located on the client (206, 208, 210, 212)
contains a subset of the data (214) on the server (204).

12. The method of any one of claims 1 to 11, wherein the
server (204) includes a proxy server for caching data in
transit between a server (204) and a client (206, 208, 210,
212).

13. The method of any one of claims 1 to 12, wherein the
update (216, 218, 220, 222) includes data that is validated
at the server (204).

14. A computer readable storage medium storing instructions
that when executed by a computer cause the computer to
perform the method of any one of claims 1 to 13.

15. An apparatus that propagates changes in data located on
a server to a copy of the data located on a client,
comprising:

a receiving mechanism (205) that receives (402) the changes
to the data on the server;
a change application mechanism (205) that applies the
changes (404) to the data on the server;
a difference determining mechanism that determines
differences (406) between the data on the server and the
copy of the data which the server has locally available,
wherein the difference determining mechanism operates
responsively to an event on the server and independently of
the client;
an update creation mechanism that constructs (408) an
update for the copy of the data, wherein the update may
include node insertion and node deletion operations for
hierarchically organized nodes in the data; and
an update sending mechanism that sends (410) the update
from the server to the client.

16. The apparatus of claim 15, wherein the difference
determining mechanism and the update creation mechanism
operate concurrently during a single pass through the data.

17. The apparatus of claim 15 or claim 16, further
comprising an updating mechanism on the client that applies
the update to the copy of the data to produce an updated
copy of the data.

18. The apparatus of any one of claims 15 to 17, wherein
the update additionally includes at least one from the
group of node move, node collapse, node split and node
update operations.

19. The apparatus of any one of claims 15 to 18, wherein
the update includes a Multipurpose Internet Mail Extensions
(MIME) content type specifying that the update contains
updating operations for hierarchically organized data.

20. The apparatus of any one of claims 15 to 19, wherein
the copy of the data located on the client (206, 208, 210,
212) contains a subset of the data on the server (204).

21. The apparatus of any one of claims 15 to 20, wherein
the update includes data that is validated at the server
(204).

22. A method for propagating changes in data located on a
server to a copy of the data located on a client,
comprising:

receiving (410) at the client an update (216, 218, 220,
222) for the copy of the data from the server (204),
wherein the update may include node insertion and node
deletion operations for hierarchically organized nodes in
the data;
applying (412) the update to the copy of the data to
produce an updated copy of the data.

23. The method of claim 22, wherein the update additionally
includes at least one from the group of node move, node
collapse, node split and node update operations.

24. The method of claim 22 or claim 23, wherein the update
includes a Multipurpose Internet Mail Extensions (MIME)
content type specifying that the update contains updating
operations for hierarchically organized data.

25. The method of any one of claims 22 to 24, wherein the
copy of the data located on the client contains a subset of
the data on the server.

26. The method of any one of claims 22 to 25, wherein the
update includes data that is validated at the server.

27. The method of any one of claims 22 to 26, wherein the
act of applying the update to the copy of the data takes
place in semiconductor memory, whereby the update is able
to proceed rapidly in the absence of time-consuming I/O
operations.






EP1 016 986 A2 



FIG. 1 (block diagram). Labeled elements: NETWORK 102;
SERVER 104; WEB SERVER (INCLUDING MECHANISM TO CREATE
UPDATES) 112; UPDATE REQUEST 120; UPDATE 122; DOCUMENT
DATABASE 116; USER 110; CACHED COPIES OF DOCUMENTS 118.



FIG. 2 (block diagram). Labeled elements: NEW CONTENT 230;
NETWORK 202; SERVER 204; PUBLISHING CODE (INCLUDING
MECHANISM TO CREATE) 205; DOCUMENT DATABASE 214; UPDATES
216, 218, 220, 222; WORKSTATION 206 with CACHED DOCUMENTS
207; PERSONAL COMPUTER 208 with CACHED DOCUMENTS 209;
NETWORK COMPUTER 210 with CACHED DOCUMENTS 211; PERSONAL
ORGANIZER 212 with CACHED DOCUMENTS 213.






FIG. 3 (flow chart):

START 300
302 RECEIVE ACCESS TO DATA
304 DETERMINE IF CLIENT HAS COPY OF DATA
    NO:  306 SEND COPY OF DATA TO CLIENT
    YES: 308 SEND REQUEST TO SERVER FOR UPDATE TO COPY
         310 DETERMINE DIFFERENCE BETWEEN DATA ON SERVER
             AND COPY OF DATA ON CLIENT
         312 USE DIFFERENCES TO CONSTRUCT UPDATE FOR COPY
             ON CLIENT
         314 SEND UPDATE TO CLIENT
         316 APPLY UPDATE TO COPY OF DATA ON CLIENT
318 ALLOW ACCESS TO DATA TO PROCEED
END 320

FIG. 4 (flow chart):

START
402 SERVER RECEIVES NEW DATA CONTENT
404 SERVER USES NEW CONTENT TO UPDATE DATA
406 DETERMINE DIFFERENCES BETWEEN DATA ON SERVER AND
    REMOTE COPY OF DATA ON SUBSCRIBER
408 USE DIFFERENCES TO CONSTRUCT UPDATE FOR REMOTE COPY OF
    THE DATA ON SUBSCRIBER
410 SEND UPDATES TO SUBSCRIBERS
412 APPLY UPDATES TO COPIES OF DATA ON SUBSCRIBERS
END 414






FIG. 5 (flow chart):

START
502 MATCH LEAF NODES OF OLD_T AND NEW_T
504 GENERATE DELETE OPERATIONS TO GET RID OF NODES IN OLD_T
    WITH NO MATCH IN NEW_T
506 GENERATE INSERT OPERATIONS FOR NODES IN NEW_T THAT ARE
    NOT IN OLD_T
508 IF A POSITION OF A NODE IN OLD_T IS DIFFERENT THAN A
    NODE IN NEW_T, GENERATE A MOVE OPERATION
510 IF A PARENT NODE IN OLD_T DOES NOT HAVE ALL OF THE SAME
    CHILDREN IN NEW_T, GENERATE A SPLIT OPERATION
512 IF A PARENT NODE IN OLD_T HAS ALL OF THE SAME CHILDREN
    AND ADDITIONAL CHILDREN IN NEW_T, GENERATE A COLLAPSE
    OPERATION
514 ASSEMBLE OPERATIONS INTO AN UPDATE
END

Loop annotation: REPEAT FOR ASCENDING LEVELS OF TREE
STARTING AT NODES FURTHEST FROM THE ROOT OF TREE.






Figure 6A: Sample Document Tree (T1)

D
  Se
    P: S(y), S(t), S(a), S(b)
    P: S(c)
  N (d)
  Se
    P: S(e)
    P: S(f), S(g)
    P: S(h), S(i)

Figure 6B: Modified Document Tree (T2)

D
  Se
    P: S(g), S(c)
    P: S(f)
  N (d)
  Se
    P: S(e)
    P: S(b), S(a), S(z)

Figure 6C: Document Tree (T1') after deletion phase

D
  Se
    P: S(a), S(b)
    P: S(c)
  N (d)
  Se
    P: S(e)
    P: S(f), S(g)






FIG. 6D

D
  Se
    P: S(a), S(b)
  N (d)
  Se
    P: S(e)
    P: S(f), S(g), S(c)

FIG. 6E

D
  Se
    P: S(a), S(b)
  N (d)
  Se
    P: S(e)
    P: S(g), S(c), S(f)

FIG. 6F

D
  Se
    P: S(a), S(b)
  N (d)
  Se
    P: S(e)
    P: S(g), S(c)
    P: S(f)






FIG. 6G

D
  Se
    P: S(a), S(b)
  N (d)
  Se
    P: S(e)
    P(gc): S(g), S(c)
    P: S(f)

FIG. 6H

D
  Se
    P(baz): S(b), S(a), S(z)
  N (d)
  Se
    P(e): S(e)
    P(gc): S(g), S(c)
    P(f): S(f)

FIG. 6I (tree diagram). Legible labels: D; Se; N (d);
Se(gcf); P(gc); P(f); P(baz); leaves (g), (c), (f), (b),
(a), (z).






FIG. 6J (tree diagram). Legible labels: Se(ebaz); N (d);
Se(gcf); P(baz); P(gc); P(f); leaves (b), (a), (z), (g),
(c), (f).

FIG. 6K

D
  Se
    P: S(g), S(c)
    P: S(f)
  N (d)
  Se
    P: S(e)
    P: S(b), S(a), S(z)






